Segmentation-based Feature Selection for Text Categorization
Abstract
Text categorization is a problem in artificial intelligence that is attracting growing attention from researchers and industry. A central problem in text categorization is selecting a good feature set. We propose a novel method that selects terms for each category by segmenting the documents belonging to that category into cohesive sub-parts that define the subtopics of each document. We then cluster these segments and use the terms found in the biggest segment cluster for each category. We compare the performance of our method with a very efficient ranking technique (χ²) and find the two very similar.
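For readers unfamiliar with the baseline, the following is a minimal sketch of chi-square term ranking, assuming the χ named in the abstract denotes the standard chi-square statistic computed from a 2×2 term/category contingency table. The function names `chi_square` and `rank_terms` and the toy data are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square score of a term for a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. the term occurs in every document).
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def rank_terms(docs, labels, category, k):
    """Return the k highest-scoring terms for `category`.

    docs: list of token lists; labels: parallel list of category names.
    Ties are broken alphabetically so the result is deterministic.
    """
    n_in = sum(1 for y in labels if y == category)
    n_out = len(labels) - n_in
    df_in, df_out = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_in if y == category else df_out
        target.update(set(tokens))  # document frequency, not term frequency
    scores = {}
    for term in set(df_in) | set(df_out):
        a, b = df_in[term], df_out[term]
        scores[term] = chi_square(a, b, n_in - a, n_out - b)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (assumed data, for illustration only).
docs = [["goal", "match"], ["goal", "team"],
        ["vote", "election"], ["vote", "team"]]
labels = ["sport", "sport", "politics", "politics"]
top_terms = rank_terms(docs, labels, "sport", 2)
```

Note that the statistic is symmetric: a term that occurs only outside the category scores as highly as one that occurs only inside it, so a practical selector may additionally check the direction of the association.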
Similar resources
Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in the amount of information, tools and methods are needed to search, filter, and manage resources. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...
A multi-criteria decision making approach in feature selection for enhancing text categorization
This paper considers the problem of feature selection in text categorization. Previous work on feature selection often used a filter model in which features, after being ranked by a measure, are selected based on a given threshold. In this paper, we present a novel approach to feature selection based on multi-criteria decision making for each feature. Instead of only one criterion, multi-criteria of ...
MMR-based Feature Selection for Text Categorization
We introduce a new method of feature selection for text categorization. Our MMR-based feature selection method strives to reduce redundancy between features while maintaining information gain in selecting appropriate features for text categorization. Empirical results show that MMR-based feature selection is more effective than Koller & Sahami’s method, which is one of greedy feature selection ...
A ME Model Based on Feature Template for Chinese Text Categorization
With the arrival of the information society and the rapid development of the Internet, people can acquire more and more information. How to use Internet information efficiently and promptly has become a hot topic in information technology. Text categorization is an important component that helps extract useful information from vast amounts of data. It assigns new documents to pre-define...
Semi Automated Text Categorization Using Demonstration Based Term Set
Manual analysis of huge amounts of textual data requires a tremendous amount of processing time and effort to read the text and organize it in the required format. Currently, the major problem in text categorization is the high dimensionality of the feature space. Nowadays there are many methods available for text feature selection. This paper aims at such se...